Abstract



Differential privacy is a framework for privately releasing summaries of a database. Previous work
has focused mainly on methods for which the output is a finite dimensional vector, or an element of
some discrete set. We develop methods for releasing functions while preserving differential privacy.
Specifically, we show that adding an appropriate Gaussian process to the function of interest yields
differential privacy. When the functions lie in the same RKHS as the Gaussian process, then the
correct noise level is established by measuring the "sensitivity" of the function in the RKHS norm.
As examples we consider kernel density estimation, kernel support vector machines, and functions in
reproducing kernel Hilbert spaces.

1 Introduction

Suppose we have a database $D$ which consists of measurements of a set of individuals. We want to release a
summary of $D$ without compromising the privacy of those individuals in the database. One framework for
defining privacy rigorously in such problems is differential privacy [14], [11]. The basic idea is to produce
an output via random noise addition. An algorithm which does this may be thought of as inducing a
distribution $P_D$ on the output space (where the randomness is due to internal "coin flips" of the algorithm),
for every input data set $D$. Differential privacy, defined in Section 2, requires that $P_D$ not depend too
strongly on any single element of the database $D$.

The literature on differential privacy is vast. Algorithms that preserve differential privacy have been 
developed for boosting, parameter estimation, clustering, logistic regression, SVM learning and many 
other learning tasks. See, for example, [15], [8], [26], [9], [20], [16], [1], and references therein. In all these
cases, the data (both the input and output) are assumed to be real numbers or vectors. In this paper we 
are concerned with a setting in which the output, and possibly the input data set, consist of functions. 



A concept that has been important in differential privacy is the "sensitivity" of the output [14]. In the
case of vector valued output the sensitivity is typically measured in the Euclidean norm or the $\ell_1$-norm.
We find that when the output is a function the sensitivity may be measured in terms of an RKHS norm.
To establish privacy a Gaussian process may be added to the function with noise level calibrated to the
"RKHS sensitivity" of the output.

The motivation for considering function valued data is two-fold. First, in some problems the data are
naturally function valued, that is, each data point is a function. For example, growth curves, temperature
profiles, and economic indicators are often of this form. This has given rise to a subfield of statistics
known as functional data analysis (see, for instance, [23]). Second, even if the data are not functions, we
may want to release a data summary that is a function. For example, if the data $d_1, \ldots, d_n \in \mathbb{R}^d$ are
a sample from a distribution with density $f$ then we can estimate the density with the kernel density
estimator

$$\hat f(x) = \frac{1}{n h^d} \sum_{i=1}^n W\left(\frac{x - d_i}{h}\right), \quad x \in \mathbb{R}^d,$$

where $W$ is a kernel (see, for instance, [27]) and $h > 0$ is the bandwidth parameter. The density estimator
is useful for many tasks such as clustering and classification. We may then want to release a "version" of
the density estimator $\hat f$ in a way which fulfills the criteria of differential privacy. The utility of such a
procedure goes beyond merely estimating the underlying density. In fact, suppose the goal is to release a
privatized database. With a differentially private density estimator in hand, a large sample of data may
be drawn from that density. The release of such a sample would inherit the differential privacy properties
of the density estimator: see, in particular, [28] and [17]. This is a very attractive proposition, since
a differentially private sample of data could be used as the basis for any number of statistical analyses
which may have been brought to bear against the original data (for instance, exploratory data analysis,
model fitting, etc.).

Histograms are an example of a density estimator that has been "privatized" in previous literature [28],
[10]. However, as density estimators, histograms are suboptimal because they are not smooth. Specifically,
they do not converge at the minimax rate under the assumption that the true density is smooth. The preferred
method for density estimation in statistics is kernel density estimation. The methods developed in this
paper lead to a private kernel density estimator.

In addition to kernel density estimation, there are many other scenarios in which the result of
a statistical analysis is a function, for example the regression function or classification function from a
supervised learning task. We demonstrate how the theory we develop may be applied in these contexts
as well.

Outline. We introduce some notation and review the definition of differential privacy in Section 2.
We also give a demonstration of a technique to achieve differential privacy for a vector valued output.
The theory for demonstrating differential privacy of functions is established in Section 3. In Section
4 we apply the theory to the problems of kernel density estimation and kernel SVM learning. We also
demonstrate how the theory may apply to a broad class of functions (a Sobolev space). Section 5 discusses
possible algorithms for outputting functions.



2 Differential Privacy 

Here we recall the definition of differential privacy and introduce some notation. Let $D = (d_1, \ldots, d_n) \in \mathcal{D}$
be an input database in which $d_i$ represents a row or an individual, and where $\mathcal{D}$ is the space of all such
databases of $n$ elements. For two databases $D, D'$, we say they are "adjacent" or "neighboring" and write
$D \sim D'$ whenever both have the same number of elements, but differ in one element. In other words,
there exists a permutation of $D$ having Hamming distance of 2 to $D'$. In some other works databases are
called "adjacent" whenever one database contains the other together with exactly one additional element.
We may characterize a non-private algorithm in terms of the function it outputs, e.g., $\theta : \mathcal{D} \to \mathbb{R}^d$. Thus
we write $\theta_D = \theta(D)$ to mean the output when the input database is $D$. Thus, a computer program which
outputs a vector may be characterized as a family of vectors $\{\theta_D : D \in \mathcal{D}\}$, one for every possible input
database. Likewise a computer program which is randomized may be characterized by the distributions
$\{P_D : D \in \mathcal{D}\}$ it induces on the output space (e.g., $\mathbb{R}^d$) when the input is $D$. We consider randomized
algorithms where the input is a database in $\mathcal{D}$ and the output takes values in a measurable space $\Omega$
endowed with the $\sigma$-field $\mathcal{A}$. Thus, to each such algorithm there corresponds the set of distributions
$\{P_D : D \in \mathcal{D}\}$ on $(\Omega, \mathcal{A})$ indexed by databases. We phrase the definition of differential privacy using this
characterization of randomized algorithms.

Definition 2.1 (Differential Privacy). A set of distributions $\{P_D : D \in \mathcal{D}\}$ is called $(\alpha, \beta)$-differentially
private, or said to "achieve $(\alpha, \beta)$-DP" whenever for all $D \sim D' \in \mathcal{D}$ we have:

$$P_D(A) \le e^{\alpha} P_{D'}(A) + \beta, \quad \forall A \in \mathcal{A}, \tag{1}$$

where $\alpha, \beta \ge 0$ are parameters, and $\mathcal{A}$ is the finest $\sigma$-field on which all $P_D$ are defined.

Typically the above definition is called "approximate differential privacy" whenever $\beta > 0$, and "$(\alpha, 0)$-differential
privacy" is shortened to "$\alpha$-differential privacy." It is important to note that the relation
$D \sim D'$ is symmetric, and so the inequality (1) is required to hold when $D$ and $D'$ are swapped. Throughout
this paper we take $\alpha \le 1$, since this simplifies some proofs.
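For a finite output space, inequality (1) can be checked directly. The following is a minimal sketch, not part of the original development: it assumes the two distributions are given as probability vectors `p` and `q` over the same finite set (both names are illustrative), and uses the fact that the supremum of $P(A) - e^{\alpha} Q(A)$ over all events $A$ is attained on the set of outcomes where $p_i > e^{\alpha} q_i$.

```python
import numpy as np

def dp_gap(p, q, alpha):
    """Return sup_A [P(A) - e^alpha * Q(A)] for two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # The supremum is attained by the event {i : p_i > e^alpha * q_i}.
    return float(np.maximum(p - np.exp(alpha) * q, 0.0).sum())

def satisfies_dp(p, q, alpha, beta):
    """Check inequality (1) in both directions for one adjacent pair."""
    return dp_gap(p, q, alpha) <= beta and dp_gap(q, p, alpha) <= beta
```

Both directions are checked because the adjacency relation is symmetric. A full verification would repeat this for every adjacent pair of databases.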

The $\sigma$-field $\mathcal{A}$ is rarely mentioned in the literature on differential privacy but is actually quite important.
For example if we were to take $\mathcal{A} = \{\Omega, \emptyset\}$ then the condition (1) is trivially satisfied by any
randomized algorithm. To make the definition as strong as possible we insist that $\mathcal{A}$ be the finest available
$\sigma$-field on which the $P_D$ are defined. Therefore when $\Omega$ is discrete the typical $\sigma$-field is $\mathcal{A} = 2^{\Omega}$ (the class
of all subsets of $\Omega$), and when $\Omega$ is a space with a topology it is typical to use the completion of the
Borel $\sigma$-field (the smallest $\sigma$-field containing all open sets). We raise this point since when $\Omega$ is a space
of functions, the choice of $\sigma$-field is more delicate.

2.1 Differential Privacy of Finite Dimensional Vectors 



Earlier work gives a technique to achieve approximate differential privacy for general vector valued outputs in
which the "sensitivity" may be bounded. We review this below, since the result is important in the
demonstration of the privacy of our methods which output functions. What follows in this section is a
mild alteration to the technique developed by [12] and [18], in that the "sensitivity" of the class of vectors
is measured in the Mahalanobis distance rather than the usual Euclidean distance.

In demonstrating the differential privacy, we make use of the following lemma which is simply an
explicit statement of an argument used in a proof by [12].



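Lemma 2.2. Suppose that, for all $D \sim D'$, there exists a set $A^\star_{D,D'} \in \mathcal{A}$ such that, for all $S \in \mathcal{A}$,

$$S \subseteq A^\star_{D,D'} \;\Rightarrow\; P_D(S) \le e^{\alpha} P_{D'}(S) \tag{2}$$

and

$$P_D(A^\star_{D,D'}) \ge 1 - \beta. \tag{3}$$

Then the family $\{P_D\}$ achieves the $(\alpha, \beta)$-DP.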
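Proof. Let $S \in \mathcal{A}$ and write $A^\star = A^\star_{D,D'}$. Then,

$$P_D(S) = P_D(S \cap A^\star) + P_D(S \cap A^{\star c}) \le P_D(S \cap A^\star) + \beta \le e^{\alpha} P_{D'}(S \cap A^\star) + \beta \le e^{\alpha} P_{D'}(S) + \beta.$$

The first inequality is due to (3), the second is due to (2) and the third is due to the monotonicity of
measures. □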

The above result shows that, so long as there is a large enough (in terms of the measure $P_D$) set on which
the $(\alpha, 0)$-DP condition holds, then the approximate $(\alpha, \beta)$-DP is achieved.

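Remark 2.3. If $(\Omega, \mathcal{A})$ has a $\sigma$-finite dominating measure $\lambda$, then for (2) to hold a sufficient condition is
that the ratio of the densities be bounded on some set $A^\star_{D,D'}$:

$$\forall a \in A^\star_{D,D'} : \quad \frac{dP_D}{d\lambda}(a) \le e^{\alpha} \frac{dP_{D'}}{d\lambda}(a). \tag{4}$$

This follows from the inequality

$$P_D(S) = \int_S \frac{dP_D}{d\lambda}(a)\, d\lambda(a) \le \int_S e^{\alpha} \frac{dP_{D'}}{d\lambda}(a)\, d\lambda(a) = e^{\alpha} P_{D'}(S).$$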

In our next result we show that approximate differential privacy is achieved via (4) when the output
is a real vector, say $v_D = v(D) \in \mathbb{R}^d$, whose dimension does not depend on the database $D$. An example
is when the database elements $d_i \in \mathbb{R}^d$ and the output is the mean vector $v(D) = n^{-1} \sum_{i=1}^n d_i$.

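Proposition 2.4. Suppose that, for a positive definite symmetric matrix $M \in \mathbb{R}^{d \times d}$, the family of vectors
$\{v_D : D \in \mathcal{D}\} \subset \mathbb{R}^d$ satisfies

$$\sup_{D \sim D'} \| M^{-1/2} (v_D - v_{D'}) \|_2 \le \Delta. \tag{5}$$

Then the randomized algorithm which, for input database $D$, outputs

$$\tilde v_D = v_D + \frac{c(\beta)\Delta}{\alpha} Z, \quad Z \sim N_d(0, M),$$

achieves $(\alpha, \beta)$-DP whenever

$$c(\beta) \ge \sqrt{2 \log \frac{2}{\beta}}. \tag{6}$$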



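Proof. Since the Gaussian measure on $\mathbb{R}^d$ admits the Lebesgue measure $\lambda$ as a $\sigma$-finite dominating measure
we consider the ratio of the densities

$$\frac{dP_D/d\lambda\,(x)}{dP_{D'}/d\lambda\,(x)} = \exp\left\{ \frac{\alpha^2}{2 c(\beta)^2 \Delta^2} \left[ (x - v_{D'})^T M^{-1} (x - v_{D'}) - (x - v_D)^T M^{-1} (x - v_D) \right] \right\}.$$

This ratio exceeds $e^{\alpha}$ only when

$$2 x^T M^{-1} (v_D - v_{D'}) + v_{D'}^T M^{-1} v_{D'} - v_D^T M^{-1} v_D > \frac{2 c(\beta)^2 \Delta^2}{\alpha}.$$

We consider the probability of this set under $P_D$, in which case we have $x = v_D + \frac{c(\beta)\Delta}{\alpha} M^{1/2} z$, where $z$
is an isotropic normal with unit variance. Substituting this expression for $x$, the condition becomes

$$\frac{c(\beta)\Delta}{\alpha} z^T M^{-1/2} (v_D - v_{D'}) > \frac{c(\beta)^2 \Delta^2}{\alpha} - \frac{1}{2} (v_D - v_{D'})^T M^{-1} (v_D - v_{D'}).$$

Multiplying by $\frac{\alpha}{c(\beta)\Delta}$ and using (5) gives

$$z^T M^{-1/2} (v_D - v_{D'}) > c(\beta)\Delta - \frac{\alpha \Delta}{2 c(\beta)}.$$

Note that the left side is a normal random variable with mean zero and variance smaller than $\Delta^2$. The
probability of this set is increasing with the variance of said variable, and so we examine the probability
when the variance equals $\Delta^2$. We also restrict to $\alpha \le 1$, and let $y \sim N(0, 1)$, yielding

$$P\left( z^T M^{-1/2} (v_D - v_{D'}) > c(\beta)\Delta - \frac{\alpha \Delta}{2 c(\beta)} \right) \le P\left( \Delta y > c(\beta)\Delta - \frac{\alpha \Delta}{2 c(\beta)} \right) \le P\left( y > c(\beta) - \frac{1}{2 c(\beta)} \right) \le \beta,$$

where $c(\beta)$ is as defined in (6) and the final inequality is proved in [11]. Thus Lemma 2.2 gives the
differential privacy. □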

Remark 2.5. The quantity (5) is a mild modification of the usual notion of "sensitivity" or "global sensitivity" [14].
It is nothing more than the sensitivity measured in the Mahalanobis distance corresponding
to the matrix $M$. The case $M = I$ corresponds to the usual Euclidean distance, a setting that has been
studied previously by [18], among others.
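The mechanism of Proposition 2.4 can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, the sensitivity bound `delta` must be supplied by the analyst (it has to hold over all adjacent pairs, which code cannot verify), and `c_beta` simply takes the smallest constant permitted by (6).

```python
import numpy as np

def c_beta(beta):
    """Smallest constant allowed by condition (6)."""
    return np.sqrt(2.0 * np.log(2.0 / beta))

def mahalanobis_sensitivity(v_d, v_dprime, M):
    """||M^{-1/2} (v_D - v_D')||_2 for a single adjacent pair."""
    diff = np.asarray(v_d, dtype=float) - np.asarray(v_dprime, dtype=float)
    # diff^T M^{-1} diff equals the squared norm of M^{-1/2} diff.
    return float(np.sqrt(diff @ np.linalg.solve(M, diff)))

def gaussian_release(v_d, M, delta, alpha, beta, rng):
    """Return v_D + (c(beta) * delta / alpha) * Z with Z ~ N_d(0, M)."""
    v_d = np.asarray(v_d, dtype=float)
    z = rng.multivariate_normal(np.zeros(len(v_d)), M)
    return v_d + (c_beta(beta) * delta / alpha) * z
```

For instance, if each row lies in $[0, 1]^d$ and $v(D)$ is the mean vector, then with $M = I$ one may take `delta = np.sqrt(d) / n`, since replacing one row moves the mean by at most that much in Euclidean norm.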

2.2 The Implications of Approximate Differential Privacy 

The above definitions provide a strong privacy guarantee in the sense that they aim to protect against an
adversary having almost complete knowledge of the private database. Specifically, an adversary knowing
all but one of the data elements and having observed the output of a private procedure, will remain unable
to determine the identity of the data element which is unknown to him. To see this, we provide an analog
of theorem 2.4 of [28], who consider the case of $\alpha$-differential privacy.

Let the adversary's database be denoted by $D_A = (d_1, \ldots, d_{n-1})$, and the private database by $D =
(d_1, \ldots, d_n)$. First note that before observing the output of the private algorithm, the adversary could
determine that the private database $D$ lay in the set $\{(d_1, \ldots, d_{n-1}, d) \in \mathcal{D}\}$. Thus, the private database
comprises his data with one more element. Since all other databases may be excluded from consideration
by the adversary we concentrate on those in the above set. In particular, we obtain the following analog
of theorem 2.4 of [28].

Proposition 2.6. Let $X \sim P_D$ where the family $P_D$ achieves the $(\alpha, \beta)$-approximate DP. Any level $\gamma$ test
of $H : D = D_0$ vs $V : D \ne D_0$ has power bounded above by $\gamma e^{\alpha} + \beta$.

The above result follows immediately from noting that the rejection region of the test is a measurable
set in the space and so obeys the constraint of the differential privacy. The implication of the above
proposition is that the power of the test will be bounded close to its size. When $\alpha, \beta$ are small, this means
that the test is close to being "trivial" in the sense that it is no more likely to correctly reject a false
hypothesis than it is to incorrectly reject the true one.
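To get a feel for the bound $\gamma e^{\alpha} + \beta$, it can be evaluated for a few parameter values. The helper below is only an illustration (its name is hypothetical); it caps the bound at 1 because power is a probability.

```python
import math

def power_upper_bound(gamma, alpha, beta):
    """Upper bound gamma * e^alpha + beta on the power of a level-gamma test."""
    return min(1.0, gamma * math.exp(alpha) + beta)
```

For a level $\gamma = 0.05$ test with $\alpha = 0.1$ and $\beta = 0.01$ the bound is roughly 0.065, only slightly above the size of the test.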

3 Approximate Differential Privacy for Functions 

The goal of releasing a function raises a number of questions. First, what does it mean for a computer
program to output a function? Second, how can the differential privacy be demonstrated? In this section
we continue to treat randomized algorithms as measures, however now they are measures over function
spaces. In Section 5 we demonstrate concrete algorithms, which in essence output the function on any
arbitrary countable set of points.

We cannot expect the techniques for finite dimensional vectors to apply directly when dealing with 
functions. The reason is that $\sigma$-finite dominating measures of the space of functions do not exist, and,
therefore, neither do densities. However, there exist probability measures on the spaces of functions. 
Below, we demonstrate the approximate differential privacy of measures on function spaces, by considering 
random variables which correspond to evaluating the random function on a finite set of points. 

We consider the family of functions over $T = \mathbb{R}^d$ (where appropriate we may restrict to a compact
subset such as the unit cube in $d$ dimensions):

$$\{f_D : D \in \mathcal{D}\} \subset \mathbb{R}^T.$$

As before, we consider randomized algorithms which on input $D$, output some $\tilde f_D \sim P_D$ where $P_D$ is
a measure on $\mathbb{R}^T$ corresponding to $D$. The nature of the $\sigma$-field on this space will be described below.

3.1 Differential Privacy on the Field of Cylinders 

We define the "cylinder sets" of functions (see [6]) for all finite subsets $S = (x_1, \ldots, x_n)$ of $T$, and Borel
sets $B$ of $\mathbb{R}^n$:

$$C_{S,B} = \{ f \in \mathbb{R}^T : (f(x_1), \ldots, f(x_n)) \in B \}.$$

These are just those functions which take values in prescribed sets, at those points in $S$. The family of
sets $\mathcal{C}_S = \{C_{S,B} : B \in \mathcal{B}(\mathbb{R}^n)\}$ forms a $\sigma$-field for each fixed $S$, since it is the preimage of $\mathcal{B}(\mathbb{R}^n)$ under
the operation of evaluation on the fixed finite set $S$. Taking the union over all finite sets $S$ yields the
collection

$$\mathcal{F}_0 = \bigcup_{S : |S| < \infty} \mathcal{C}_S.$$

This is a field (see [6] page 508) although not a $\sigma$-field, since it does not have the requisite closure under
countable intersections (namely it does not contain cylinder sets for which $S$ is countably infinite). We
focus on the creation of algorithms for which the differential privacy holds over the field of cylinder sets,
in the sense that, for all $D \sim D' \in \mathcal{D}$,

$$P(\tilde f_D \in A) \le e^{\alpha} P(\tilde f_{D'} \in A) + \beta, \quad \forall A \in \mathcal{F}_0. \tag{7}$$



This statement appears to be prima facie unlike the definition (1), since $\mathcal{F}_0$ is not a $\sigma$-field on $\mathbb{R}^T$.
However, we give a limiting argument which demonstrates that to satisfy (7) is to achieve the approximate
$(\alpha, \beta)$-DP throughout the generated $\sigma$-field. First we note that satisfying (7) implies that the release of
any finite evaluation of the function achieves the differential privacy. Since for any finite $S \subset T$, we have
that $\mathcal{C}_S \subset \mathcal{F}_0$, we readily obtain the following result.

Proposition 3.1. Let $x_1, \ldots, x_n$ be any finite set of points in $T$ chosen a priori. Then whenever (7)
holds, the release of the vector

$$\left( \tilde f_D(x_1), \ldots, \tilde f_D(x_n) \right)$$

satisfies the $(\alpha, \beta)$-DP.

Proof. We have that

$$P\left( (\tilde f_D(x_1), \ldots, \tilde f_D(x_n)) \in A \right) = P\left( \tilde f_D \in C_{\{x_1, \ldots, x_n\}, A} \right).$$

The claimed privacy guarantee follows from (7). □

We now give a limiting argument to extend (7) to the generated $\sigma$-field (or, equivalently, the $\sigma$-field
generated by the cylinders of dimension 1)

$$\mathcal{F} \stackrel{\text{def}}{=} \sigma(\mathcal{F}_0) = \bigcup_{S} \mathcal{C}_S,$$

where the union extends over all the countable subsets $S$ of $T$. The second equality above is due to [6]
theorem 36.3 part ii.

Note that, for countable $S$, the cylinder sets take the form

$$C_{S,B} = \{ f \in \mathbb{R}^T : f(x_i) \in B_i, \; i = 1, 2, \ldots \} = \bigcap_{i=1}^{\infty} C_{\{x_i\}, B_i},$$

where the $B_i$'s are Borel sets of $\mathbb{R}$.

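Proposition 3.2. Let (7) hold. Then, the family $\{P_D : D \in \mathcal{D}\}$ on $(\mathbb{R}^T, \mathcal{F})$ satisfies for all $D \sim D' \in \mathcal{D}$:

$$P_D(A) \le e^{\alpha} P_{D'}(A) + \beta, \quad \forall A \in \mathcal{F}. \tag{8}$$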



Proof. Define $C_{S,B,n} = \bigcap_{i=1}^{n} C_{\{x_i\}, B_i}$. Then, the sets $C_{S,B,n}$ form a sequence of sets which decreases
towards $C_{S,B}$ and $C_{S,B} = \lim_{n \to \infty} C_{S,B,n}$. Since the sequence of sets is decreasing and the measure in
question is a probability (hence bounded above by 1), we have

$$P_D(C_{S,B}) = P_D\left( \lim_{n \to \infty} C_{S,B,n} \right) = \lim_{n \to \infty} P_D(C_{S,B,n}).$$

Therefore, for each pair $D \sim D'$ and for every $\epsilon > 0$, there exists an $n_0$ so that for all $n \ge n_0$

$$|P_D(C_{S,B}) - P_D(C_{S,B,n})| < \epsilon, \qquad |P_{D'}(C_{S,B}) - P_{D'}(C_{S,B,n})| < \epsilon.$$

The number $n_0$ depends on whichever is the slowest sequence to converge. Finally we obtain

$$P_D(C_{S,B}) \le P_D(C_{S,B,n_0}) + \epsilon \le e^{\alpha} P_{D'}(C_{S,B,n_0}) + \beta + \epsilon \le e^{\alpha} P_{D'}(C_{S,B}) + \beta + \epsilon (1 + e^{\alpha}).$$

Since this holds for all $\epsilon > 0$ we conclude that $P_D(C_{S,B}) \le e^{\alpha} P_{D'}(C_{S,B}) + \beta$. □

In principle, if it were possible for a computer to release a complete description of the function
$\tilde f_D$ then this result would demonstrate the privacy guarantee achieved by our algorithm. In practice a
computer algorithm which runs in a finite amount of time may only output a finite set of points, hence
this result is mainly of theoretical interest. However, in the case in which the functions to be output
are continuous, and the restriction is made that $P_D$ are measures over $C[0, 1]$ (the continuous functions
on the unit interval), another description of the $\sigma$-field becomes available. Namely, the restriction of $\mathcal{F}$
to the elements of $C[0, 1]$ corresponds to the Borel $\sigma$-field over $C[0, 1]$ with the topology induced by the
uniform norm ($\|f\|_{\infty} = \sup_t |f(t)|$). Therefore in the case of continuous functions, differential privacy
over $\mathcal{F}_0$ leads to differential privacy throughout the Borel $\sigma$-field.

In summary, we find that if every finite dimensional projection of the released function satisfies differential
privacy, then so does every countable-dimensional projection. We now explore techniques which
achieve the differential privacy over these $\sigma$-fields.

3.2 Differential Privacy via the Exponential Mechanism 

A straightforward means to output a function in a way which achieves the differential privacy is to make
use of the so-called "exponential mechanism" of [19]. This approach entails the construction of a suitable
finite set of functions $G = \{g_1, \ldots, g_m\} \subset \mathbb{R}^T$, in which every $f_D$ has a reasonable approximation, under
some distance function $d$. Then, when the input is $D$, a function is chosen to output by sampling the set
$G$ with probabilities given by

$$P_D(g_i) \propto \exp\left\{ -\frac{\alpha}{2}\, d(g_i, f_D) \right\}.$$

[19] demonstrate that such a technique achieves the $\alpha$-differential privacy, which is strictly stronger than
the $(\alpha, \beta)$-differential privacy we consider here. Although this technique is conceptually appealing for
its simplicity, it remains challenging to use in practice since the set of functions $G$ may need to be very
large in order to ensure the utility of the released function (in the sense of expected error). Since the
algorithm which outputs from $P_D$ must obtain the normalization constant of the distribution above, it
must evidently compute the probabilities for each $g_i$, which may be extremely time consuming. Note that
techniques such as importance sampling are also difficult to bring to bear against this problem when it is
important to maintain utility.

The technique given above can be interpreted as outputting a discrete random variable, and fulfilling
the privacy definition with respect to the $\sigma$-field consisting of the powerset of $G$. This implies the privacy
with respect to the cylinder sets, since the restriction of each cylinder set to the elements of $G$ corresponds
to some subset of $G$.
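The sampling step of the exponential mechanism can be sketched as follows. This is a hedged illustration, not the construction of [19]: it assumes the distances $d(g_i, f_D)$ have already been computed and are passed in as an array, and it assumes weights proportional to $\exp\{-(\alpha/2)\, d(g_i, f_D)\}$, which is the usual scaling when $d$ changes by at most 1 between adjacent databases. The function names are hypothetical.

```python
import numpy as np

def exponential_mechanism_probs(distances, alpha):
    """Sampling probabilities proportional to exp(-(alpha / 2) * d_i)."""
    d = np.asarray(distances, dtype=float)
    logits = -0.5 * alpha * d
    logits -= logits.max()      # stabilise before exponentiating
    w = np.exp(logits)
    return w / w.sum()          # explicit normalisation over all of G

def sample_candidate(distances, alpha, rng):
    """Draw the index i of the candidate g_i to release."""
    p = exponential_mechanism_probs(distances, alpha)
    return int(rng.choice(len(p), p=p))
```

The normalisation touches every element of $G$, which is the computational cost discussed above: the work grows linearly in the number of candidate functions.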

We note that the exponential mechanism above essentially corresponded to a discretization of the
function space $\mathbb{R}^T$. An alternative is to discretize the input space $T$, and to approximate the function
by a piecewise constant function where the pieces correspond to the discretization of T. Thereupon the 
approximation may be regarded as a real valued vector, with one entry for the value of each piece of the 
function. This is conceptually appealing but it remains to be seen whether the sensitivity of such a vector 
valued output could be bounded. In the next section we describe a method which may be regarded as 
similar to the above, and which has the nice property that the choice of discretization is immaterial to 
the method and to the determination of sensitivity. 

3.3 Differential Privacy via Gaussian Process Noise 

We propose to use measures $P_D$ over functions, which are Gaussian processes. The reason is that there is a
strong connection between these measures over the infinite dimensional function space, and the Gaussian
measures over finite dimensional vector spaces such as those used in Proposition 2.4. Therefore, with
some additional technical machinery which we will illustrate next, it is possible to move from differentially
private measures over vectors to those over functions.

A Gaussian process indexed by $T$ is a collection of random variables $\{X_t : t \in T\}$, for which each
finite subset is distributed as a multivariate Gaussian (see, for instance, [HE]). A sample from a Gaussian
process may be considered as a function $T \to \mathbb{R}$, by examining the so-called "sample path" $t \mapsto X_t$. The
Gaussian process is determined by the mean and covariance functions, defined on $T$ and $T^2$ respectively,
as

$$m(t) = \mathbb{E} X_t, \qquad K(s, t) = \mathrm{Cov}(X_s, X_t).$$

For any finite subset $S \subset T$, the random vector $\{X_t : t \in S\}$ has a normal distribution with the means,
variances, and covariances given by the above functions. Such a "finite dimensional distribution" may
be regarded as a projection of the Gaussian process. Below we propose particular mean and covariance
functions for which Proposition 2.4 will hold for all finite dimensional distributions. These will require
some smoothness properties of the family of functions $\{f_D\}$. We first demonstrate the technical machinery
which allows us to move from finite dimensional distributions to distributions on the function space, and
then we give differentially private measures on function spaces of one dimension. Finally, we extend our
results to multiple dimensions.
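A finite dimensional distribution of a Gaussian process can be sampled directly from its covariance matrix. The sketch below is illustrative only: it assumes a one-dimensional index set, a mean-zero process, and a Gaussian covariance function with bandwidth `h` (chosen here because it is used in the examples later in the paper). It takes a matrix square root through an eigendecomposition, clipping tiny negative eigenvalues that arise from rounding; this numerical choice is ours, not the paper's.

```python
import numpy as np

def gaussian_kernel(s, t, h):
    """K(s, t) = exp(-(s - t)^2 / (2 h^2)) for scalar or broadcastable inputs."""
    return np.exp(-(s - t) ** 2 / (2.0 * h ** 2))

def sample_gp_on_grid(grid, h, rng):
    """Draw (X_t : t in grid) from a mean-zero Gaussian process."""
    grid = np.asarray(grid, dtype=float)
    cov = gaussian_kernel(grid[:, None], grid[None, :], h)
    vals, vecs = np.linalg.eigh(cov)
    # Columns of vecs scaled by sqrt(eigenvalue) give a square root of cov.
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))
    return root @ rng.standard_normal(len(grid))
```

Each call returns the values of one sample path at the grid points; evaluating the same path at further points would require conditioning on the values already drawn.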

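Proposition 3.3. Let $G$ be a sample path of a Gaussian process having mean zero and covariance function
$K$. Let $\{f_D : D \in \mathcal{D}\}$ be a family of functions indexed by databases. Then the release of

$$\tilde f_D = f_D + \frac{\Delta c(\beta)}{\alpha} G$$

is $(\alpha, \beta)$-differentially private (with respect to the cylinder $\sigma$-field $\mathcal{F}$) whenever

$$\sup_{D \sim D'} \; \sup_{n < \infty} \; \sup_{(x_1, \ldots, x_n) \in T^n} \left\| \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_n) \\ \vdots & \ddots & \vdots \\ K(x_n, x_1) & \cdots & K(x_n, x_n) \end{pmatrix}^{-1/2} \begin{pmatrix} f_D(x_1) - f_{D'}(x_1) \\ \vdots \\ f_D(x_n) - f_{D'}(x_n) \end{pmatrix} \right\|_2 \le \Delta. \tag{9}$$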



Proof. For any finite set $(x_1, \ldots, x_n) \in T^n$, the vector $(G(x_1), \ldots, G(x_n))$ follows a multivariate normal
distribution having mean zero and covariance matrix specified by $\mathrm{Cov}(G(x_i), G(x_j)) = K(x_i, x_j)$. Thus
for the vector obtained by evaluation of $\tilde f_D$ at those points, differential privacy is demonstrated by Proposition
2.4 since (9) implies the sensitivity bound (5). Thus, for any $n < \infty$ and any $(x_1, \ldots, x_n) \in T^n$ we
have

$$P\left( (\tilde f_D(x_1), \ldots, \tilde f_D(x_n)) \in B \right) \le e^{\alpha} P\left( (\tilde f_{D'}(x_1), \ldots, \tilde f_{D'}(x_n)) \in B \right) + \beta, \quad \forall B \in \mathcal{B}(\mathbb{R}^n).$$

Finally note that for any $A \in \mathcal{F}_0$, we may write $A = C_{X_n, B}$ for some finite $n$, some vector $X_n =
(x_1, \ldots, x_n) \in T^n$ and some Borel set $B$. Then

$$P(\tilde f_D \in A) = P\left( (\tilde f_D(x_1), \ldots, \tilde f_D(x_n)) \in B \right).$$

Combining this with the above gives the requisite privacy statement for all $A \in \mathcal{F}_0$. Proposition 3.2
carries this to $\mathcal{F}$. □



3.4 Functions in a Reproducing Kernel Hilbert Space 

When the family of functions lies in the reproducing kernel Hilbert space (RKHS) which corresponds to
the covariance kernel of the Gaussian process, then establishing upper bounds of the form (9) is simple.
Below, we give some basic definitions for RKHSs, and refer the reader to [5] for a more detailed account.
We first recall that the RKHS is generated from the closure of those functions which can be represented
as finite linear combinations of the kernel, i.e.,

$$\mathcal{H}_0 = \left\{ \sum_{i=1}^{n} \xi_i K_{x_i} \right\}$$

for some finite $n$ and sequence $\xi_i \in \mathbb{R}$, $x_i \in T$, and where $K_x = K(x, \cdot)$. For two functions $f = \sum_{i=1}^{n} \theta_i K_{x_i}$
and $g = \sum_{j=1}^{m} \xi_j K_{y_j}$ the inner product is given by

$$\langle f, g \rangle_{\mathcal{H}} = \sum_{i=1}^{n} \sum_{j=1}^{m} \theta_i \xi_j K(x_i, y_j),$$

and the corresponding norm of $f$ is $\|f\|_{\mathcal{H}} = \sqrt{\langle f, f \rangle_{\mathcal{H}}}$. This gives rise to the "reproducing" nature of the
Hilbert space, namely, $\langle K_x, K_y \rangle_{\mathcal{H}} = K(x, y)$. Furthermore, the functionals $\langle K_x, \cdot \rangle_{\mathcal{H}}$ correspond to point
evaluation, i.e.,

$$\langle K_x, f \rangle_{\mathcal{H}} = \sum_{i=1}^{n} \theta_i K(x_i, x) = f(x).$$
The RKHS $\mathcal{H}$ is then the closure of $\mathcal{H}_0$ with respect to the RKHS norm. We now present the main
theorem which suggests an upper bound of the form required in Proposition 3.3.



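Proposition 3.4. For $f \in \mathcal{H}$, where $\mathcal{H}$ is the RKHS corresponding to the kernel $K$, and for any finite
sequence $x_1, \ldots, x_n$ of distinct points in $T$, we have:

$$\left\| \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_n) \\ \vdots & \ddots & \vdots \\ K(x_n, x_1) & \cdots & K(x_n, x_n) \end{pmatrix}^{-1/2} \begin{pmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix} \right\|_2 \le \|f\|_{\mathcal{H}}.$$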



The proof is in the appendix. Together with Proposition 3.3, this result implies the following. 
Corollary 3.5. For $\{f_D : D \in \mathcal{D}\} \subset \mathcal{H}$, the release of

$$\tilde f_D = f_D + \frac{\Delta c(\beta)}{\alpha} G$$

is $(\alpha, \beta)$-differentially private (with respect to the cylinder $\sigma$-field) whenever

$$\Delta \ge \sup_{D \sim D'} \| f_D - f_{D'} \|_{\mathcal{H}}$$

and when $G$ is the sample path of a Gaussian process having mean zero and covariance function $K$, given
by the reproducing kernel of $\mathcal{H}$.
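The inequality of Proposition 3.4, which underlies the corollary, can be checked numerically for functions in $\mathcal{H}_0$. The sketch below assumes a one-dimensional Gaussian kernel with bandwidth `h` and a function $f = \sum_i \theta_i K_{z_i}$; the names `theta`, `centers` and `points` are illustrative. It compares the finite dimensional quantity on the left of the inequality with the RKHS norm $\|f\|_{\mathcal{H}} = \sqrt{\theta^T K \theta}$.

```python
import numpy as np

def gaussian_gram(a, b, h):
    """Matrix of K(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 h^2))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * h ** 2))

def rkhs_norm(theta, centers, h):
    """||f||_H for f = sum_i theta_i K_{z_i}, with z = centers."""
    theta = np.asarray(theta, dtype=float)
    return float(np.sqrt(theta @ gaussian_gram(centers, centers, h) @ theta))

def finite_dim_norm(theta, centers, points, h):
    """||K(points, points)^{-1/2} (f(x_1), ..., f(x_n))||_2."""
    f_x = gaussian_gram(points, centers, h) @ np.asarray(theta, dtype=float)
    gram = gaussian_gram(points, points, h)
    # f_x^T gram^{-1} f_x is the squared norm of gram^{-1/2} f_x.
    return float(np.sqrt(f_x @ np.linalg.solve(gram, f_x)))
```

When the evaluation points coincide with the centers the two quantities agree, and for other point sets the finite dimensional norm is the smaller one. Note that the Gram matrix becomes ill-conditioned when points are close together relative to `h`, so this check is only reliable for well separated points.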



4 Examples 

We now give some examples in which the above technique may be used to construct private versions of 
functions in an RKHS. 



4.1 Kernel Density Estimation 

Let $f_D$ be the kernel density estimator, where $D$ is regarded as a sequence of points $x_i \in T$ for $i = 1, \ldots, n$
drawn from a distribution with density $f$. Let $h$ denote the bandwidth. Assuming a Gaussian kernel, the
estimator is

$$f_D(x) = \frac{1}{n (2\pi h^2)^{d/2}} \sum_{i=1}^{n} \exp\left\{ -\frac{\|x - x_i\|_2^2}{2h^2} \right\}, \quad x \in T.$$



Let $D \sim D'$ so that $D' = x_1, \ldots, x_{n-1}, x'_n$ (no loss of generality is incurred by demanding that the
data sequences differ in their last element). Then,

$$(f_D - f_{D'})(x) = \frac{1}{n (2\pi h^2)^{d/2}} \left[ \exp\left\{ -\frac{\|x - x_n\|_2^2}{2h^2} \right\} - \exp\left\{ -\frac{\|x - x'_n\|_2^2}{2h^2} \right\} \right].$$



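If we use the Gaussian kernel as the covariance function for the Gaussian process then upper bounding
the RKHS norm of this difference is straightforward. Let the covariance function be
$K(x, y) = \exp\left\{ -\frac{\|x - y\|_2^2}{2h^2} \right\}$. Then $f_D - f_{D'} = \frac{1}{n (2\pi h^2)^{d/2}} \left( K_{x_n} - K_{x'_n} \right)$ and

$$\| f_D - f_{D'} \|_{\mathcal{H}}^2 = \left( \frac{1}{n (2\pi h^2)^{d/2}} \right)^2 \left( K(x_n, x_n) + K(x'_n, x'_n) - 2 K(x_n, x'_n) \right) \le 2 \left( \frac{1}{n (2\pi h^2)^{d/2}} \right)^2.$$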



If we release

$$\tilde f_D = f_D + \frac{c(\beta) \sqrt{2}}{\alpha n (2\pi h^2)^{d/2}} G,$$

where $G$ is a sample path of a Gaussian process having mean zero and covariance $K$, then differential
privacy is demonstrated by Corollary 3.5. We may compare the utility of the released estimator to that
of the non-private version. Under standard smoothness assumptions on $f$, it is well-known (see [27]) that
the risk is

$$R = \mathbb{E} \int (f_D(x) - f(x))^2 \, dx = c_1 h^4 + \frac{c_2}{n h^d},$$

for some constants $c_1$ and $c_2$. The optimal bandwidth is $h \asymp (1/n)^{1/(4+d)}$, in which case $R = O(n^{-4/(4+d)})$.
For the differentially private function it is easy to see that

$$\mathbb{E} \int (\tilde f_D(x) - f(x))^2 \, dx = O\left( h^4 + \frac{c_2}{n h^d} \right).$$

Therefore, at least in terms of rates, no accuracy has been lost.
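The release described in this section can be sketched for one-dimensional data evaluated on a finite grid. This is an illustration under stated assumptions rather than the authors' implementation: $d = 1$, the noise is the finite dimensional marginal of the Gaussian process on the chosen grid, $c(\beta)$ is taken as $\sqrt{2 \log(2/\beta)}$, and the function name and the eigendecomposition-based square root are our own choices.

```python
import numpy as np

def private_kde_on_grid(data, grid, h, alpha, beta, rng):
    """Return (f_D, released version) evaluated at the grid points, for d = 1."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = len(data)
    norm_const = n * np.sqrt(2.0 * np.pi * h ** 2)   # n (2 pi h^2)^{d/2}, d = 1
    # Non-private Gaussian kernel density estimator on the grid.
    f_d = np.exp(-(grid[:, None] - data[None, :]) ** 2 / (2.0 * h ** 2)).sum(axis=1)
    f_d = f_d / norm_const
    # RKHS sensitivity bound: Delta = sqrt(2) / (n (2 pi h^2)^{d/2}).
    delta = np.sqrt(2.0) / norm_const
    c = np.sqrt(2.0 * np.log(2.0 / beta))
    # Gaussian process marginal on the grid, covariance given by the same kernel.
    cov = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2.0 * h ** 2))
    vals, vecs = np.linalg.eigh(cov)
    g = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ rng.standard_normal(len(grid))
    return f_d, f_d + (delta * c / alpha) * g
```

With $n = 100$, $h = 0.1$, $\alpha = 1$ and $\beta = 0.1$ the noise multiplier $\Delta c(\beta) / \alpha$ is about 0.14, which is small relative to the height of a density with two well separated peaks on the unit interval.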

4.1.1 Non-Isotropic Kernels 

The above demonstration of privacy also holds when the kernel is replaced by a non-isotropic Gaussian
kernel. In this case the kernel density estimate may take the form

$$f_D(x) = \frac{1}{n (2\pi)^{d/2} |H|^{1/2}} \sum_{i=1}^{n} \exp\left\{ -\frac{1}{2} (x - x_i)^T H^{-1} (x - x_i) \right\}, \quad x \in T,$$

where $H$ is a positive definite matrix and $|H|$ is the determinant. For example it may be required to
employ a different choice of bandwidth for each coordinate of the space, in which case $H$ would be a
diagonal matrix having non-equal entries on the diagonal. So long as $H$ is fixed a priori, privacy may be
established by adding a Gaussian process having mean zero and covariance given by

$$K(x, y) = \exp\left\{ -\frac{1}{2} (x - y)^T H^{-1} (x - y) \right\}.$$

As above, the sensitivity is upper bounded, as

$$\| f_D - f_{D'} \|_{\mathcal{H}} \le \frac{\sqrt{2}}{n (2\pi)^{d/2} |H|^{1/2}}.$$
Figure 1: An example of a kernel density estimator (the black curve) and the released version (the blue
curve). This uses the method developed in Section 4.1. Here we sampled $n = 100$ points from a mixture
of two normals centered at 0.3 and 0.7 respectively. We use $h = 0.1$ and have $\alpha = 1$ and $\beta = 0.1$. The
Gaussian process is evaluated on an evenly spaced grid of 1000 points between 0 and 1. Note that gross
features of the original kernel density estimator remain, namely the two peaks.



Therefore it satisfies the $(\alpha, \beta)$-DP to release

$$\tilde f_D = f_D + \frac{c(\beta) \sqrt{2}}{\alpha n (2\pi)^{d/2} |H|^{1/2}} G,$$

where $G$ is a sample path of a Gaussian process having mean zero and covariance $K$.

4.1.2 Private Choice of Bandwidth 

Note that the above assumed that $h$ (or $H$) was fixed a priori by the user. In usual statistical settings $h$
is a parameter that is tuned depending on the data (not simply set to the correct order of growth as a
function of $n$). Thus rather than a fixed $h$ the user would use $\hat h$ which depends on the data itself. In order
to do this it is necessary to find a differentially private version of $\hat h$ and then to employ the composition
property of differential privacy (citation).

The typical way that the bandwidth is selected is by employing the leave-one-out cross validation. 
This consists of choosing a grid of candidate values for h, evaluating the leave one out log likelihood for 
each value, and then choosing whichever is the maximizer. This technique may be amenable to private 
analysis via the "exponential mechanism" of (citation), however it would evidently require that T be a 
compact set which is known a-priori. An alternative is to use a "rule of thumb" (see [25]) for determining 
the bandwidth which is given by

$$h_j = \left( \frac{4}{(d+1)\, n} \right)^{\frac{1}{d+4}} \frac{\mathrm{IQR}_j}{1.34},$$

in which $\mathrm{IQR}_j$ is the observed interquartile range of the data along the $j$th coordinate. Thus this method
gives a diagonal matrix $H$ as in the above section. To make a private version $\tilde h_j$ we may use the technique
of [13] in which a differentially private algorithm for the interquartile range was developed.



4.2 Functions in a Sobolev Space 

The above technique worked easily since we chose a particular RKHS in which we knew the kernel density 
estimator to live. What's more, since the functions themselves lay in the generating set of functions for that 
space, the determination of the norm of the difference $f_D - f_{D'}$ was extremely simple. In general we may 
not be so lucky that the family of functions is amenable to such analysis. In this section we demonstrate 
a more broadly applicable technique which may be used whenever the functions are sufficiently smooth. 
Consider the Sobolev space 



$$H^1[0,1] = \left\{ f \in C[0,1] : \int_0^1 (\partial f(x))^2 \, d\lambda(x) < \infty \right\}.$$



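This is an RKHS with the kernel $K(x,y) = \exp\{-\gamma |x - y|\}$ for a positive constant $\gamma$. The norm in this 
space is given by 

$$\|f\|^2_{\mathcal{H}} = \frac{1}{2}\left(f(0)^2 + f(1)^2\right) + \frac{1}{2\gamma}\int_0^1 (\partial f(x))^2 + \gamma^2 f(x)^2 \, d\lambda(x). \tag{10}$$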

See e.g., [5] (p. 316) and [21] for details. Thus for a family of functions in one dimension which lie in 
the Sobolev space $H^1$, we may determine a noise level necessary to achieve differential privacy by 
bounding the above quantity for the difference of two functions. For functions over higher dimensional 
domains (as $[0,1]^d$ for some $d > 1$) we may construct an RKHS by taking the $d$-fold tensor product of 
the above RKHS (see, in particular [HI [3] for details on the construction). The resulting space has the 
reproducing kernel 

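$$K(x,y) = \exp\{-\gamma \|x - y\|_1\},$$

and is the completion of the set of functions 

$$\mathcal{G}_0 = \left\{ f : [0,1]^d \to \mathbb{R} \;:\; f(x_1, \ldots, x_d) = f_1(x_1) \cdots f_d(x_d),\ f_i \in H^1[0,1] \right\}.$$

The norm over this set of functions is given by: 

$$\|f\|^2_{\mathcal{G}_0} = \prod_{j=1}^{d} \|f_j\|^2_{\mathcal{H}}. \tag{11}$$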
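
To make the one-dimensional norm and its product form concrete, the sketch below evaluates them numerically, assuming the norm $\frac12(f(0)^2+f(1)^2) + \frac{1}{2\gamma}\int_0^1 (\partial f)^2 + \gamma^2 f^2 \, d\lambda$. The derivative is approximated by finite differences, so the result is only approximate. As a sanity check, the reproducing property implies that $K(\cdot, y)$ has squared norm $K(y,y) = 1$. All function names are hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule (avoids depending on a particular NumPy version)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def sobolev_norm_sq(f, gamma, m=200001):
    """Approximate squared RKHS norm of f on [0, 1] for the kernel exp(-gamma |x - y|)."""
    x = np.linspace(0.0, 1.0, m)
    fx = f(x)
    dfx = np.gradient(fx, x)                      # finite-difference derivative
    boundary = 0.5 * (fx[0] ** 2 + fx[-1] ** 2)
    integral = trapezoid(dfx ** 2 + gamma ** 2 * fx ** 2, x)
    return boundary + integral / (2.0 * gamma)

def product_norm_sq(factors, gamma):
    """Squared norm of f(x_1, ..., x_d) = f_1(x_1) ... f_d(x_d) as a product of 1-d norms."""
    return float(np.prod([sobolev_norm_sq(f, gamma) for f in factors]))

gamma = 2.0
# Squared norm of K(., 0.3); by the reproducing property this should be close to 1.
check = sobolev_norm_sq(lambda x: np.exp(-gamma * np.abs(x - 0.3)), gamma)
```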

The norm over the completed space agrees with the above on $\mathcal{G}_0$. The explicit form is obtained 
by substituting (10) into the right hand side of (11) and replacing all instances of $\prod_{j=1}^{d} f_j(x_j)$ with 
$f(x_1, \ldots, x_d)$. Thus the norm in the completed space is defined for all $f$ possessing all first partial 
derivatives which are all in $L_2$. 

We revisit the example of a kernel density estimator (with an isotropic Gaussian kernel). We note 
that this isotropic kernel function is in the set $\mathcal{G}_0$ defined above, as 

$$\phi_{\mu,h}(x) = \frac{1}{(2\pi h^2)^{d/2}} \exp\left\{-\frac{\|x - \mu\|_2^2}{2h^2}\right\} = \prod_{j=1}^{d} \frac{1}{\sqrt{2\pi}\,h} \exp\left\{-\frac{(x_j - \mu_j)^2}{2h^2}\right\} = \prod_{j=1}^{d} \phi_{\mu_j,h}(x_j),$$



where $\phi_{\mu,h}$ is the isotropic Gaussian kernel on $\mathbb{R}^d$ with mean vector $\mu$ and $\phi_{\mu_j,h}$ is the Gaussian kernel 
in one dimension with mean $\mu_j$. We obtain the norm of the latter one dimensional function by bounding 



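the elements of the sum in (10) as follows: 

$$\int_0^1 (\partial \phi_{\mu_j,h}(x))^2 \, d\lambda(x) \le \int_{-\infty}^{\infty} \frac{(x - \mu_j)^2}{2\pi h^6}\, e^{-(x-\mu_j)^2/h^2} \, d\lambda(x) = \frac{1}{4\sqrt{\pi}\,h^3},$$

and 

$$\int_0^1 \phi_{\mu_j,h}(x)^2 \, d\lambda(x) \le \int_{-\infty}^{\infty} \frac{1}{2\pi h^2}\, e^{-(x-\mu_j)^2/h^2} \, d\lambda(x) = \frac{1}{2\sqrt{\pi}\,h},$$

where we have used the fact that 

$$\phi_{\mu_j,h}(x)^2 \le \frac{1}{2\pi h^2}, \quad \forall x \in \mathbb{R}.$$

Therefore, choosing $\gamma = 1/h$ leads to 

$$\|\phi_{\mu_j,h}\|^2_{\mathcal{H}} \le \frac{1}{2\pi h^2} + \frac{1}{8\sqrt{\pi}\,h^2} + \frac{1}{4\sqrt{\pi}\,h^2} \le \frac{1}{\sqrt{2\pi}\,h^2},$$

and 

$$\|\phi_{\mu,h}\|^2_{\mathcal{G}_0} \le \frac{1}{(2\pi)^{d/2} h^{2d}}.$$

Finally, 

$$\|f_D - f_{D'}\|^2_{\mathcal{G}_0} \le \frac{4}{n^2 (2\pi)^{d/2} h^{2d}}.$$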

Therefore, we observe a technique which attains higher generality than the ad hoc analysis of the 
preceding section. However, this comes at the expense of the noise level, which grows at a higher rate as d 
increases. An example of the technique applied to the same kernel density estimation problem as above 
is given in Figure 2. 
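
The growth with d can be illustrated by comparing the two sensitivity bounds. This sketch assumes the bound $\sqrt{2}/(n(2\pi h^2)^{d/2})$ for the Gaussian-kernel analysis of the preceding section and $2/(n(2\pi)^{d/4}h^d)$ for the Sobolev-space analysis. The two mechanisms add different Gaussian processes, so the ratio is only a rough indication of the relative noise level. The function names are hypothetical.

```python
import numpy as np

def sens_gaussian_rkhs(n, h, d):
    # Assumed bound from the Gaussian-kernel RKHS analysis: sqrt(2) / (n (2 pi h^2)^(d/2)).
    return np.sqrt(2.0) / (n * (2.0 * np.pi * h ** 2) ** (d / 2))

def sens_sobolev(n, h, d):
    # Assumed bound from the Sobolev-space analysis: 2 / (n (2 pi)^(d/4) h^d).
    return 2.0 / (n * (2.0 * np.pi) ** (d / 4) * h ** d)

# Ratio of the two bounds; algebraically it equals sqrt(2) * (2 pi)^(d/4), independent of n and h.
ratios = [sens_sobolev(100, 0.1, d) / sens_gaussian_rkhs(100, 0.1, d) for d in (1, 2, 3)]
```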

4.3 Minimizers of Regularized Functionals in an RKHS 

The construction of the following section is due to [7], who were interested in determining the sensitivity 
of certain kernel machines (among other algorithms) with the aim of bounding the generalization error 
of the output classifiers. [24] noted that these bounds are useful for establishing the noise level required 
for differential privacy of support vector machines. They are also useful for our approach to privacy in a 
function space. 

Figure 2: An example of a kernel density estimator (the black curve) and the released version (the blue 
curve). The setup is the same as in Figure 1 but the privacy mechanism developed in Section 4.2 was 
used instead. Note that the released function does not have the desirable smoothness of the released function 
from Figure 1. 

We consider classification and regression schemes in which the datasets are $D = \{z_1, \ldots, z_n\}$ with 
$z_i = (x_i, y_i)$, where $x_i \in [0,1]^d$ are some covariates and $y_i$ is some kind of label, either taking values in 
$\{-1, +1\}$ in the case of classification or taking values in some interval when the goal is regression. 
Thus the output functions are from $[0,1]^d$ to a subset of $\mathbb{R}$. The functions we are interested in take the 
form 

$$f_D = \arg\min_{g \in \mathcal{H}} \frac{1}{n} \sum_{z_i \in D} \ell(g, z_i) + \lambda \|g\|^2_{\mathcal{H}}, \tag{12}$$



where $\mathcal{H}$ is some RKHS to be determined, and $\ell$ is the so-called "loss function." We now recall a definition 
from [7] (using $M$ in place of their $\sigma$ to prevent confusion): 

Definition 4.1 (M-admissible loss function: see [7]). A loss function $\ell(g, z) = c(g(x), y)$ is called 
M-admissible whenever $c$ is convex in its first argument and Lipschitz with constant $M$ in its first 
argument. 



We will now demonstrate that for (12), whenever the loss function is admissible, the minimizers on 
adjacent datasets may be bounded close together in RKHS norm. Denote the part of the optimization 
due to the loss function: 

$$L_D(f) = \frac{1}{n} \sum_{z_i \in D} \ell(f, z_i).$$



Using the technique from the proof of Lemma 20 of [7] we find that since $\ell$ is convex in its first argument 
we have 

$$L_D(f_D + \eta\,\delta_{D',D}) - L_D(f_D) \le \eta\,\big(L_D(f_{D'}) - L_D(f_D)\big),$$

where $\eta \in [0,1]$ and we use $\delta_{D',D} = f_{D'} - f_D$. This also holds when $f_D$ and $f_{D'}$ swap places. Summing 
the resulting inequality with the above and rearranging yields 

$$L_D(f_{D'} - \eta\,\delta_{D',D}) - L_D(f_{D'}) \le L_D(f_D) - L_D(f_D + \eta\,\delta_{D',D}).$$

Due to the definition of $f_D$, $f_{D'}$ as the minimizers of their respective functionals we have 

$$L_D(f_D) + \lambda \|f_D\|^2_{\mathcal{H}} \le L_D(f_D + \eta\,\delta_{D',D}) + \lambda \|f_D + \eta\,\delta_{D',D}\|^2_{\mathcal{H}},$$
$$L_{D'}(f_{D'}) + \lambda \|f_{D'}\|^2_{\mathcal{H}} \le L_{D'}(f_{D'} - \eta\,\delta_{D',D}) + \lambda \|f_{D'} - \eta\,\delta_{D',D}\|^2_{\mathcal{H}}.$$

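This leads to the inequalities 

$$\begin{aligned}
0 &\ge \lambda \left( \|f_D\|^2_{\mathcal{H}} - \|f_D + \eta\,\delta_{D',D}\|^2_{\mathcal{H}} + \|f_{D'}\|^2_{\mathcal{H}} - \|f_{D'} - \eta\,\delta_{D',D}\|^2_{\mathcal{H}} \right) \\
&\quad + L_D(f_D) - L_D(f_D + \eta\,\delta_{D',D}) + L_{D'}(f_{D'}) - L_{D'}(f_{D'} - \eta\,\delta_{D',D}) \\
&\ge 2\lambda \|\eta\,\delta_{D',D}\|^2_{\mathcal{H}} - L_D(f_{D'}) + L_D(f_{D'} - \eta\,\delta_{D',D}) + L_{D'}(f_{D'}) - L_{D'}(f_{D'} - \eta\,\delta_{D',D}) \\
&= 2\lambda \|\eta\,\delta_{D',D}\|^2_{\mathcal{H}} + \frac{1}{n} \Big( \ell(f_{D'} - \eta\,\delta_{D',D}, z) - \ell(f_{D'}, z) + \ell(f_{D'}, z') - \ell(f_{D'} - \eta\,\delta_{D',D}, z') \Big),
\end{aligned}$$

where $z \in D$ and $z' \in D'$ are the elements in which the two datasets differ. 
Moving the loss function term to the other side and using the Lipschitz property we finally obtain that 

$$\|f_D - f_{D'}\|^2_{\mathcal{H}} \le \frac{M}{\lambda n} \|f_D - f_{D'}\|_{\infty}.$$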

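What's more, the reproducing property together with the Cauchy-Schwarz inequality yields 

$$|f_D(x) - f_{D'}(x)| = \left| \langle f_D - f_{D'}, K_x \rangle_{\mathcal{H}} \right| \le \|f_D - f_{D'}\|_{\mathcal{H}} \sqrt{K(x,x)}.$$

Combining with the previous result gives 

$$\|f_D - f_{D'}\|^2_{\mathcal{H}} \le \frac{M}{\lambda n} \|f_D - f_{D'}\|_{\mathcal{H}} \sqrt{\sup_x K(x,x)},$$

which, in turn, leads to 

$$\|f_D - f_{D'}\|_{\mathcal{H}} \le \frac{M}{\lambda n} \sqrt{\sup_x K(x,x)}.$$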
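
The bound can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not the construction of [7]: it swaps in the logistic loss (convex and 1-Lipschitz, so $M = 1$) because it is smooth and easy to minimize, uses a Gaussian kernel with $\sup_x K(x,x) = 1$, and fits both minimizers by functional gradient descent over the union of the points of $D$ and $D'$. All names are hypothetical.

```python
import numpy as np

def gauss_kernel(a, b, h=0.2):
    """Gaussian kernel matrix; sup_x K(x, x) = 1."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * h ** 2))

def fit(x, y, active, lam, steps=2000, lr=1.0):
    """Minimize (1/n) sum_i log(1 + exp(-y_i f(x_i))) + lam ||f||_H^2 over active points.

    f is represented as sum_i a_i K(x_i, .); coefficients of inactive points stay at zero.
    """
    K = gauss_kernel(x, x)
    n = active.sum()
    a = np.zeros(len(x))
    for _ in range(steps):
        f = K @ a
        g = -y / (1.0 + np.exp(y * f))                # derivative of the logistic loss at f(x_i)
        a -= lr * (active * g / n + 2.0 * lam * a)    # functional gradient step in the RKHS
    return a

rng = np.random.default_rng(0)
n, lam = 30, 0.05
x = rng.uniform(0.0, 1.0, n + 1)
y = rng.choice([-1.0, 1.0], size=n + 1)
in_D = np.ones(n + 1); in_D[n] = 0.0          # D uses points 0..n-1
in_Dp = np.ones(n + 1); in_Dp[n - 1] = 0.0    # D' replaces point n-1 by point n
delta = fit(x, y, in_D, lam) - fit(x, y, in_Dp, lam)
rkhs_dist = np.sqrt(max(delta @ gauss_kernel(x, x) @ delta, 0.0))  # ||f_D - f_D'||_H
bound = 1.0 / (lam * n)                                            # M / (lam n) * sqrt(sup K)
```

In this setup `rkhs_dist` should not exceed `bound`; the bound is a worst case over all adjacent datasets, so the observed distance is typically much smaller.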

For a soft-margin kernel SVM we have the loss function $\ell(g, z) = (1 - y\,g(x))_+$, which means the positive 
part of the term in parentheses. Since the label $y$ takes on either plus or minus one, we find this to be 
1-admissible. An example of a kernel SVM in $T = \mathbb{R}^2$ is shown in Figure 3. 
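
As a small illustration, the hinge loss and the resulting sensitivity bound $\frac{M}{\lambda n}\sqrt{\sup_x K(x,x)}$ can be written as follows. The function names are hypothetical, and the default `sup_k=1.0` assumes a kernel with $K(x,x) = 1$, such as the Gaussian kernel.

```python
import numpy as np

def hinge(t, y):
    """Hinge loss c(t, y) = (1 - y t)_+ ; convex and 1-Lipschitz in t when y is +1 or -1."""
    return np.maximum(0.0, 1.0 - y * t)

def svm_sensitivity(n, lam, sup_k=1.0, M=1.0):
    """Bound on ||f_D - f_D'||_H for an M-admissible loss: M sqrt(sup K) / (lam n)."""
    return M * np.sqrt(sup_k) / (lam * n)
```

The noise multiplier for the released SVM decision function would then be $c(\beta)/\alpha$ times `svm_sensitivity(n, lam)`.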

5 Algorithms 

There are two main modes in which functions $\tilde{f}_D$ could be released by the holder of the data $D$ to the 
outside parties. The first is a "batch" setting in which the parties designate some finite collection of 
points $x_1, \ldots, x_n \in T$. The database owner computes $\tilde{f}_D(x_i)$ for each $i$ and returns the vector of results. 
At this point the entire transaction would end with only the collection of pairs $(x_i, \tilde{f}_D(x_i))$ being known 
to the outsiders. An alternative is the "online" setting in which outside users repeatedly specify points 
$x_i \in T$ and the database owner replies with $\tilde{f}_D(x_i)$, but unlike the former setting he remains available to 
respond to more requests for function evaluations. We name these settings "batch" and "online" for their 
resemblance to the batch and online settings typically considered in machine learning algorithms. 

Figure 3: An example of a kernel support vector machine. In the top image are the data points, with the 
colors representing the two class labels. The background color corresponds to the class predicted by the 
learned kernel SVM. In the bottom image are the same data points, with the predictions of the private 
kernel SVM. This example uses the Gaussian kernel for classification. 

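The batch method is nothing more than sampling a multivariate Gaussian, since the set $x_1, \ldots, x_n \in T$ 
specifies the finite dimensional distribution of the Gaussian process from which to sample. The released 
vector is simply 

$$\begin{pmatrix} \tilde{f}_D(x_1) \\ \vdots \\ \tilde{f}_D(x_n) \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} f_D(x_1) \\ \vdots \\ f_D(x_n) \end{pmatrix},\ \frac{c(\beta)^2 \Delta^2}{\alpha^2} \begin{pmatrix} K(x_1,x_1) & \cdots & K(x_1,x_n) \\ \vdots & \ddots & \vdots \\ K(x_n,x_1) & \cdots & K(x_n,x_n) \end{pmatrix} \right).$$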
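
A minimal sketch of the batch release follows. It assumes the noise multiplier $c(\beta)\Delta/\alpha$ with $c(\beta) = \sqrt{2\log(2/\beta)}$ and samples the Gaussian vector through a Cholesky factor of the Gram matrix. The small diagonal jitter is a numerical convenience that is not part of the mechanism and slightly perturbs the covariance. The function name and signature are hypothetical.

```python
import numpy as np

def batch_release(f_vals, K, delta, alpha, beta, rng=None):
    """Return f_D(x_1..x_n) plus (c(beta) * delta / alpha) times a N(0, K) draw.

    f_vals : values f_D(x_i); K : Gram matrix K(x_i, x_j); delta : RKHS sensitivity bound.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = delta * np.sqrt(2.0 * np.log(2.0 / beta)) / alpha  # assumed form of c(beta)
    # Jitter keeps the factorization stable when K is nearly singular (an assumption here).
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(K)))
    return f_vals + scale * (L @ rng.standard_normal(len(K)))
```

For a smooth kernel on a dense grid the Gram matrix can be numerically singular, in which case a larger jitter or an eigendecomposition may be needed.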

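In the online setting, the data owner upon receiving a request for evaluation at $x_i$ would sample the 
Gaussian process conditioned on the samples already produced at $x_1, \ldots, x_{i-1}$. Let 

$$C_i = \begin{pmatrix} K(x_1,x_1) & \cdots & K(x_1,x_{i-1}) \\ \vdots & \ddots & \vdots \\ K(x_{i-1},x_1) & \cdots & K(x_{i-1},x_{i-1}) \end{pmatrix}, \quad G_i = \begin{pmatrix} \tilde{f}_D(x_1) \\ \vdots \\ \tilde{f}_D(x_{i-1}) \end{pmatrix}, \quad V_i = \begin{pmatrix} K(x_1,x_i) \\ \vdots \\ K(x_{i-1},x_i) \end{pmatrix}.$$

Then, 

$$\tilde{f}_D(x_i) \sim \mathcal{N}\left( V_i^T C_i^{-1} G_i,\ K(x_i,x_i) - V_i^T C_i^{-1} V_i \right).$$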
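
The conditional sampling step can be sketched as below for the mean-zero, unit-scale noise process; the released value at $x_i$ would add $f_D(x_i)$ to the scaled noise. To avoid re-inverting the Gram matrix at every request, the sketch keeps $C_i^{-1}$ current with a standard block-inverse (Schur complement) update. The class and method names are hypothetical, and the points are assumed distinct.

```python
import numpy as np

class OnlineGPNoise:
    """Sequentially samples a mean-zero Gaussian process at requested (distinct) points."""

    def __init__(self, kernel, rng=None):
        self.kernel = kernel                   # scalar function K(x, y)
        self.rng = np.random.default_rng() if rng is None else rng
        self.xs, self.gs = [], []              # queried points and sampled noise values
        self.Cinv = np.zeros((0, 0))           # inverse Gram matrix of the queried points

    def conditional(self, x):
        """Conditional mean, variance and weights C^{-1} V at x given earlier samples."""
        if not self.xs:
            return 0.0, float(self.kernel(x, x)), np.zeros(0)
        V = np.array([self.kernel(xj, x) for xj in self.xs])
        w = self.Cinv @ V
        mean = float(w @ np.array(self.gs))
        var = float(self.kernel(x, x) - V @ w)
        return mean, var, w

    def sample(self, x):
        mean, var, w = self.conditional(x)
        if var <= 1e-12:
            raise ValueError("point (numerically) coincides with an earlier request")
        g = mean + np.sqrt(var) * self.rng.standard_normal()
        n = len(self.xs)
        new = np.empty((n + 1, n + 1))         # block inverse; var is the Schur complement
        new[:n, :n] = self.Cinv + np.outer(w, w) / var
        new[:n, n] = -w / var
        new[n, :n] = -w / var
        new[n, n] = 1.0 / var
        self.Cinv = new
        self.xs.append(x)
        self.gs.append(g)
        return g
```

Each request still costs time at least linear in the number of earlier requests, since `V` must be built and multiplied by the stored inverse.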



The database owner may track the inverse matrix $C_i^{-1}$ and after each request update it into $C_{i+1}^{-1}$ by 
making use of Schur complements combined with the matrix inversion lemma. Nevertheless we note 
that as $i$ increases the computational complexity of answering the request will in general grow. At the 
very least, the construction of $V_i$ takes time proportional to $i$. This may make this approach problematic 
to implement in practice. However, we note that when using the covariance kernel 

$$K(x,y) = \exp\{-\gamma |x - y|_1\}$$

a more efficient algorithm presents itself. This is the kernel considered in Section 4.2. Due to the 
above form of $K$, we find that for $x < y < z$ we have $K(x,z) = K(x,y)K(y,z)$. Therefore in using the 
above algorithm we would find that $V_i$ is always contained in the span of at most two rows of $C_i$. This is 
most evident when, for instance, $x_i < \min_{j<i} x_j$. In this case let $m = \arg\min_{j<i} x_j$; then $V_i = K(x_i, x_m) C_i(m)$, 
in which $C_i(m)$ means the $m$-th row of $C_i$. Therefore $C_i^{-1} V_i$ will be a sparse vector with exactly one 
non-zero entry (taking value $K(x_i, x_m)$) in the $m$-th position. Similar algebra applies whenever $x_i$ falls between 
two previous points, in which case $V_i$ lies in the span of the two rows corresponding to the closest point 
on the left and the closest on the right. Using the above kernel with some choice of $\gamma$ let 

$$\rho(x,y) = e^{\gamma |x-y|} - e^{-\gamma |x-y|}.$$

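Let $\xi(x_i) = \tilde{f}_D(x_i) - f_D(x_i)$ represent the noise process. We find the conditional distribution of $\xi(x_i)$ 
to be normal with mean and variance given by: 

$$E[\xi(x_i)] = \begin{cases}
K(x_i, x_{(1)})\,\xi(x_{(1)}) & x_i < x_{(1)} \\
K(x_i, x_{(i-1)})\,\xi(x_{(i-1)}) & x_i > x_{(i-1)} \\
\dfrac{\rho(x_{(j+1)}, x_i)}{\rho(x_{(j)}, x_{(j+1)})}\,\xi(x_{(j)}) + \dfrac{\rho(x_{(j)}, x_i)}{\rho(x_{(j)}, x_{(j+1)})}\,\xi(x_{(j+1)}) & x_{(j)} < x_i < x_{(j+1)},
\end{cases}$$

and 

$$\mathrm{Var}[\tilde{f}_D(x_i)] = \begin{cases}
1 - K(x_i, x_{(1)})^2 & x_i < x_{(1)} \\
1 - K(x_i, x_{(i-1)})^2 & x_i > x_{(i-1)} \\
\dfrac{\rho(x_{(j)}, x_i)\,\rho(x_i, x_{(j+1)})}{\rho(x_{(j)}, x_{(j+1)})} & x_{(j)} < x_i < x_{(j+1)},
\end{cases}$$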



where $x_{(1)} < x_{(2)} < \cdots < x_{(i-1)}$ are the points $x_1, \ldots, x_{i-1}$ after being sorted into increasing order. In 
using the above algorithm it is only necessary for the data owner to store the values $x_i$ and $\tilde{f}_D(x_i)$. When 
using the proper data structures, e.g., a sorted doubly linked list for the $x_i$, it is possible to determine 
the mean and variance using the above technique in time proportional to $\log(i)$, which is a significant 
improvement over the general linear time scheme above (note that the linked list is suggested since then 
it is possible to update the list in constant time). 
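
The neighbor-based update can be sketched as follows for the unit-scale noise process with kernel $\exp\{-\gamma|x-y|\}$. The conditional mean and variance depend only on the nearest stored points to the left and right, which are found by binary search. The variance in the interior case is written as $\rho(x_{(j)},x_i)\rho(x_i,x_{(j+1)})/\rho(x_{(j)},x_{(j+1)})$, which follows from the Markov property of this process. A plain Python list is used here, so the search is logarithmic but insertion is linear; a balanced tree or skip list would be needed for logarithmic updates. All names are hypothetical.

```python
import bisect
import numpy as np

def K(x, y, gamma):
    return np.exp(-gamma * abs(x - y))

def rho(x, y, gamma):
    return np.exp(gamma * abs(x - y)) - np.exp(-gamma * abs(x - y))

def markov_conditional(xs_sorted, noise, x, gamma):
    """Conditional mean and variance of the noise at x given values at sorted points."""
    if not xs_sorted:
        return 0.0, 1.0
    j = bisect.bisect_left(xs_sorted, x)       # binary search for the neighbors of x
    if j == 0:                                 # x lies to the left of every stored point
        k = K(x, xs_sorted[0], gamma)
        return k * noise[0], 1.0 - k ** 2
    if j == len(xs_sorted):                    # x lies to the right of every stored point
        k = K(x, xs_sorted[-1], gamma)
        return k * noise[-1], 1.0 - k ** 2
    a, b = xs_sorted[j - 1], xs_sorted[j]      # nearest neighbors on each side
    r = rho(a, b, gamma)
    mean = (rho(b, x, gamma) * noise[j - 1] + rho(a, x, gamma) * noise[j]) / r
    var = rho(a, x, gamma) * rho(x, b, gamma) / r
    return mean, var

def sample_online(xs_sorted, noise, x, gamma, rng):
    """Sample the noise at x, store it in sorted position, and return it."""
    mean, var = markov_conditional(xs_sorted, noise, x, gamma)
    if var <= 0.0:                             # x was already queried: reuse the stored value
        return mean
    g = mean + np.sqrt(var) * rng.standard_normal()
    j = bisect.bisect_left(xs_sorted, x)
    xs_sorted.insert(j, x)
    noise.insert(j, g)
    return g
```

For large $\gamma|x-y|$ the exponentials in `rho` can overflow; on $[0,1]$ with moderate $\gamma$ this is not a concern.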



6 Conclusion 

We have shown how to add random noise to a function in such a way that differential privacy is preserved. 
It would be interesting to study this method in the many applications of functional data analysis [23] . 

On a more theoretical note, we have not addressed the issue of lower bounds. Specifically, we can ask: 
Given that we want to release a differentially private function, what is the least amount of noise that 
must necessarily be added in order to preserve differential privacy? This question has been addressed in 
detail for real-valued, count-valued and vector-valued data. However, those techniques apply to the case 
of $\beta = 0$, whereupon the family $\{P_D\}$ are all mutually absolutely continuous. In the case of $\beta > 0$, which 
we consider, this no longer applies and so the determination of lower bounds is complicated (for example, 
since quantities such as the KL divergence are no longer bounded). 



7 Appendix 



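Proof of Proposition 3.4. Note that invertibility of the matrix is safely assumed due to Mercer's 
theorem. Denote the matrix by $M^{-1}$, where $M_{i,j} = K(x_i, x_j)$. Denote by $P$ the operator $\mathcal{H} \to \mathcal{H}$ defined by 

$$P = \sum_{i=1}^{n} \sum_{j=1}^{n} K_{x_i} (M^{-1})_{i,j} \langle K_{x_j}, \cdot \rangle_{\mathcal{H}}.$$

We find this operator to be idempotent in the sense that $P = P^2$: 

$$\begin{aligned}
P^2 &= \sum_{i=1}^{n} \sum_{j=1}^{n} K_{x_i} (M^{-1})_{i,j} \left\langle K_{x_j}, \sum_{k=1}^{n} \sum_{l=1}^{n} K_{x_k} (M^{-1})_{k,l} \langle K_{x_l}, \cdot \rangle_{\mathcal{H}} \right\rangle_{\mathcal{H}} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{n} K_{x_i} (M^{-1})_{i,j} K(x_j, x_k) (M^{-1})_{k,l} \langle K_{x_l}, \cdot \rangle_{\mathcal{H}} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{n} K_{x_i} (M^{-1})_{i,j} M_{j,k} (M^{-1})_{k,l} \langle K_{x_l}, \cdot \rangle_{\mathcal{H}} \\
&= \sum_{i=1}^{n} \sum_{l=1}^{n} K_{x_i} (M^{-1})_{i,l} \langle K_{x_l}, \cdot \rangle_{\mathcal{H}} \\
&= P.
\end{aligned}$$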

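$P$ is also self-adjoint due to the symmetry of $M$, i.e. 

$$\begin{aligned}
\langle Pf, g \rangle_{\mathcal{H}} &= \left\langle \sum_{i=1}^{n} \sum_{j=1}^{n} K_{x_i} (M^{-1})_{i,j} \langle K_{x_j}, f \rangle_{\mathcal{H}},\ g \right\rangle_{\mathcal{H}} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} (M^{-1})_{i,j} \langle K_{x_j}, f \rangle_{\mathcal{H}} \langle K_{x_i}, g \rangle_{\mathcal{H}} \\
&= \left\langle f,\ \sum_{i=1}^{n} \sum_{j=1}^{n} K_{x_j} (M^{-1})_{j,i} \langle K_{x_i}, g \rangle_{\mathcal{H}} \right\rangle_{\mathcal{H}} \\
&= \langle f, Pg \rangle_{\mathcal{H}}.
\end{aligned}$$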



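Therefore, 

$$\begin{aligned}
\|f\|^2_{\mathcal{H}} &= \langle f, f \rangle_{\mathcal{H}} \\
&= \langle Pf + (f - Pf),\ Pf + (f - Pf) \rangle_{\mathcal{H}} \\
&= \langle Pf, Pf \rangle_{\mathcal{H}} + 2 \langle Pf, f - Pf \rangle_{\mathcal{H}} + \langle f - Pf, f - Pf \rangle_{\mathcal{H}} \\
&= \langle Pf, Pf \rangle_{\mathcal{H}} + 2 \langle f, Pf - P^2 f \rangle_{\mathcal{H}} + \langle f - Pf, f - Pf \rangle_{\mathcal{H}} \\
&= \langle Pf, Pf \rangle_{\mathcal{H}} + \langle f - Pf, f - Pf \rangle_{\mathcal{H}} \\
&\ge \langle Pf, Pf \rangle_{\mathcal{H}} \\
&= \langle f, Pf \rangle_{\mathcal{H}}.
\end{aligned}$$

The latter term is nothing more than the left hand side in the statement. In summary, the quantity in the 
statement of the theorem is just the squared RKHS norm in the restriction of $\mathcal{H}$ to the subspace spanned 
by the functions $K_{x_i}$. □ 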
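
The inequality can be checked numerically. The sketch below assumes the statement takes the form $(f(x_1),\ldots,f(x_n))\,M^{-1}\,(f(x_1),\ldots,f(x_n))^T \le \|f\|^2_{\mathcal{H}}$ with $M$ the Gram matrix at the evaluation points, and uses a Gaussian kernel with a function $f$ built as a finite combination of kernel sections so that its RKHS norm is available in closed form. The specific points, bandwidth and names are illustrative choices.

```python
import numpy as np

def gauss_kernel(a, b, h=0.3):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * h ** 2))

rng = np.random.default_rng(1)
centers = rng.uniform(0.0, 1.0, 8)            # f = sum_k coef_k K(centers_k, .)
coef = rng.standard_normal(8)
norm_sq = coef @ gauss_kernel(centers, centers) @ coef   # ||f||_H^2

x = np.array([0.1, 0.35, 0.6, 0.85])          # evaluation points x_1, ..., x_n
fx = gauss_kernel(x, centers) @ coef          # (f(x_1), ..., f(x_n))
M = gauss_kernel(x, x)                        # Gram matrix at the evaluation points
quad = fx @ np.linalg.solve(M, fx)            # f(x)^T M^{-1} f(x) = <f, P f>_H
```

Here `quad` is the squared norm of the projection of $f$ onto the span of the $K_{x_i}$, so it cannot exceed `norm_sq` up to rounding error.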